Add raw search fast path with code domain support by michael-schvarcz · Pull Request #174 · XortexAI/XMem

michael-schvarcz · 2026-05-11T18:38:22Z

Addresses #163.

This adds a low-latency raw search path that returns ranked memory hits without the agentic LLM planning/synthesis round, while keeping answer generation optional through answer=true.

What changed:

Adds RetrievalPipeline.raw_search for direct profile, temporal, summary, and snippet retrieval without tool selection.
Extends /v1/memory/search to optionally include code hits by querying the existing code retrieval pipeline's native symbol/file search tools.
Adds answer=true synthesis after raw hits are collected, without a second planning step.
Caches profile catalogs and agentic retrieval plans.
Tracks bounded p50/p95/p99 latency snapshots by retrieval mode.
Adds tests proving raw search bypasses LLM tool planning, answer mode only synthesizes after hits, and cached plans are reused.

Compared with the currently open #173, this implementation also wires the requested code domain into the raw search endpoint.

Validation:

.venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -q
9 passed

git diff --check
passed

gemini-code-assist

Code Review

This PR expands the memory search capabilities by adding a "code" domain for repository-level searches and introducing a fast raw_search path that avoids LLM planning. Key architectural additions include a latency tracking system for performance monitoring and caching mechanisms for retrieval plans and profile catalogs. The review feedback highlights several critical improvements: addressing a potential TypeError when rounding null scores, implementing bounded caches to prevent memory leaks, maintaining proper encapsulation of private methods, and enhancing the robustness of concurrent tool execution through better error handling.

gemini-code-assist · 2026-05-11T18:40:09Z

+                SourceRecord(
+                    domain=s.domain,
+                    content=s.content,
+                    score=round(s.score, 3),


The score attribute of a SourceRecord can be None (as handled in the sorting logic on line 728). Attempting to call round(None, 3) will raise a TypeError. It is safer to provide a default value of 0.0 if the score is missing.

Suggested change

score=round(s.score, 3),

score=round(s.score, 3) if s.score is not None else 0.0,

gemini-code-assist · 2026-05-11T18:40:09Z

+        self._profile_catalog_cache: Dict[
+            str, Tuple[float, List[Dict[str, str]], List[Any]]
+        ] = {}
+        self._retrieval_plan_cache: Dict[str, List[Dict[str, Any]]] = {}


The _profile_catalog_cache and _retrieval_plan_cache are implemented as unbounded dictionaries. In a long-running API process, these will grow indefinitely as different users and queries are processed, potentially leading to memory exhaustion. It is recommended to use a cache with a maximum size or an eviction policy (e.g., cachetools.TTLCache or a simple LRU implementation).

gemini-code-assist · 2026-05-11T18:40:09Z

-        data = SearchResponse(results=all_results, total=len(all_results))
+        answer = ""
+        if req.answer:
+            answer = await pipeline._answer_from_sources(query=req.query, sources=records)


This line accesses _answer_from_sources, which is a private method of the RetrievalPipeline class. To maintain proper encapsulation, this method should be made public (e.g., renamed to answer_from_sources) if it is intended to be used outside the class.

gemini-code-assist · 2026-05-11T18:40:09Z

+        if include_answer:
+            answer = await self._answer_from_sources(query=query, sources=sources)
+
+        confidence = min(1.0, len(sources) * 0.15) if sources else 0.1


The confidence calculation logic here (len(sources) * 0.15) is inconsistent with the logic used in the agentic run method on line 308 (len(sources) * 0.2). Consider unifying this logic or defining the multiplier as a constant to ensure consistent confidence scoring across different retrieval modes.

gemini-code-assist · 2026-05-11T18:40:09Z

+            code_results = await asyncio.gather(
+                code_pipeline._execute_tool(
+                    tool_name="search_symbols",
+                    tool_args={"query": req.query, "repo": req.repo},
+                    repo=req.repo,
+                    top_k=req.top_k,
+                    user_id=user_id,
+                ),
+                code_pipeline._execute_tool(
+                    tool_name="search_files",
+                    tool_args={"query": req.query, "repo": req.repo},
+                    repo=req.repo,
+                    top_k=req.top_k,
+                    user_id=user_id,
+                ),
+            )


If any of the code retrieval tool calls fail, the entire search request will return a 500 error. For a more robust search experience, consider using asyncio.gather(..., return_exceptions=True) and filtering out failed results, allowing the user to see memory hits even if the code domain search is temporarily unavailable.

michael-schvarcz · 2026-05-12T13:46:32Z

Updated this PR to address the review points:

Normalizes missing source scores through a shared helper before API serialization.
Keeps raw memory search resilient when one requested backend fails by using asyncio.gather(..., return_exceptions=True) and returning healthy domain results.
The current head already uses bounded LRU-style caches, public answer_from_sources / confidence_from_sources helpers, and the shared confidence multiplier.

Validation: .venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -> 9 passed.

Ankit-Kotnala · 2026-05-13T12:53:42Z

I rechecked the Gemini items and they look addressed. One remaining edge case I noticed is in src/pipelines/retrieval.py::_format_tool_results: API serialization now normalizes None scores, but answer=true can still fail before serialization because answer_from_sources() formats sources via _format_tool_results(), which does rec.score > 0 / {rec.score:.2f}.

If any backend returns SourceRecord(score=None), raw search without answer works, but raw search with answer=true can raise there. Can we normalize the score inside _format_tool_results() too, e.g. score = rec.score or 0.0, before formatting?

After that, this looks good to me.

Ankit-Kotnala

Overall this looks good and the Gemini review items appear addressed.

I left one blocking edge-case comment around _format_tool_results(): answer=true can still fail if any backend returns SourceRecord(score=None), because the answer formatting path uses rec.score > 0 / {rec.score:.2f} before API serialization normalizes scores.

Once that score normalization is handled in _format_tool_results(), I’m happy with this PR.

michael-schvarcz · 2026-05-13T13:11:12Z

Fixed the remaining missing-score edge case in _format_tool_results() by normalizing None to 0.0 before score comparison/formatting, so answer=true no longer crashes before API serialization.

Added regression coverage for answer_from_sources() with SourceRecord(score=None).

Validation:

.venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -q
# 10 passed
git diff --check

nagiexplorer88 · 2026-05-14T01:07:17Z

One score-hardening gap still remains after 89fe3e4. _score_value() only maps None to 0.0:

def _score_value(score: float | None) -> float:
    return round(score, 3) if score is not None else 0.0

If any vector/code backend returns a non-finite float (nan or inf), round(score, 3) preserves that non-finite value. The /v1/memory/search path then places it into SearchResponse.results[*].score, and _wrap() passes body.model_dump() into JSONResponse. Starlette JSON responses reject non-finite floats, so raw search can still fail at response rendering even though the missing-score case is covered.

This also affects retrieve_memory() because it uses the same helper for result.sources. A small float(...) + math.isfinite(...) coercion before rounding would close both paths.

Ankit-Kotnala

Thanks, this addresses my concern. _format_tool_results() now safely handles missing source scores before formatting, and the added regression test covers answer_from_sources() with score=None.

Looks good to merge from my side.

@ishaanxgupta you can merge this if there aren't any concerns on your side.

greptile-apps · 2026-05-23T09:25:13Z

Greptile Summary

This PR adds a low-latency raw_search fast path to RetrievalPipeline that retrieves ranked memory hits across profile, temporal, summary, and snippet domains without an LLM planning step, and extends POST /v1/memory/search with optional code-domain support (via the existing CodeRetrievalPipeline) and optional answer synthesis (answer=true).

RetrievalPipeline.raw_search: gathers results from all requested domains in parallel, ranks by score, and optionally calls answer_from_sources — bypassing the two-step agentic plan entirely.
Caching: _profile_catalog_cache (LRU + 5 min TTL) avoids repeated Pinecone metadata scans; _retrieval_plan_cache (LRU, no TTL) reuses LLM tool-selection plans for identical query+catalog pairs.
search_memory route rewrite: merges memory and code results, applies _score_value normalization, records per-mode latency, and returns SearchResponse enriched with answer, confidence, and latency fields.

Confidence Score: 3/5

Safe to merge with low risk, but several small correctness issues should be addressed before the latency and sort paths are relied upon in production.

The core raw_search logic and code-domain wiring look correct and are well-tested. Four issues reduce confidence: the singleton pipeline's global latency snapshot is returned in every SearchResponse (leaking system-wide telemetry to any API caller); the retrieval plan cache has no TTL, so stale plans can persist across model upgrades as long as the catalog text doesn't change; _rank_sources and the route-level sort both pass NaN/Inf scores unchanged to Python's sort (which produces undefined ordering for NaN keys); and the strip_search_query validator runs after the min_length constraint, allowing a whitespace-only query to arrive at vector stores as an empty string.

src/pipelines/retrieval.py (_rank_sources, _cache_retrieval_plan) and src/api/routes/memory.py (sort call, latency exposure) deserve a second look before this ships to production.

Important Files Changed

Filename	Overview
src/pipelines/retrieval.py	Adds raw_search fast path, profile catalog cache (5 min TTL), LRU retrieval plan cache (no TTL), latency tracker, and helpers (confidence_from_sources, answer_from_sources, _rank_sources). Plan cache lacks TTL and _rank_sources doesn't sanitize non-finite scores.
src/api/routes/memory.py	search_memory rewritten to use raw_search + optional code domain search + optional answer synthesis. Adds _score_value helper. Exposes singleton pipeline's global latency snapshot in every SearchResponse.
src/api/schemas.py	SearchRequest gains answer, org_id, repo fields and a strip_search_query validator; SearchResponse gains answer, model, confidence, latency fields. Validator can produce empty string after stripping whitespace-only input, bypassing min_length=1.
tests/integration/test_retrieval_pipeline.py	Adds four new integration tests covering raw_search ranking, answer synthesis, non-finite score handling, and plan cache reuse. Tests are well-structured and cover the key new paths.

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchRoute as POST /v1/memory/search
    participant MemPipeline as RetrievalPipeline
    participant CodePipeline as CodeRetrievalPipeline
    participant LLM

    Client->>SearchRoute: "{query, domains, top_k, answer, org_id, repo}"
    alt "code" in domains AND no org_id
        SearchRoute-->>Client: 400 org_id required
    end

    SearchRoute->>MemPipeline: "raw_search(domains-code, include_answer=False)"
    Note over MemPipeline: profile from cache (5 min TTL)<br/>temporal, summary, snippet in parallel
    MemPipeline-->>SearchRoute: RetrievalResult.sources

    opt "code" in domains
        SearchRoute->>CodePipeline: _execute_tool(search_symbols) [parallel]
        SearchRoute->>CodePipeline: _execute_tool(search_files) [parallel]
        CodePipeline-->>SearchRoute: code SourceRecords
    end

    SearchRoute->>SearchRoute: sort all records by score desc

    opt "req.answer == True"
        SearchRoute->>MemPipeline: answer_from_sources(query, records)
        MemPipeline->>LLM: ainvoke(ANSWER_PROMPT)
        LLM-->>MemPipeline: answer text
        MemPipeline-->>SearchRoute: answer
    end

    SearchRoute-->>Client: "SearchResponse{results, total, answer, model, confidence, latency}"

_{Reviews (1): Last reviewed commit: "Normalize non-finite source scores" | Re-trigger Greptile}

greptile-apps · 2026-05-23T09:25:19Z

Comments Outside Diff (5)

src/api/routes/memory.py, line 760-776 (link)

Cross-user latency telemetry in API response. pipeline is a singleton shared across all users, so latency_snapshot() returns aggregate p50/p95/p99 statistics accumulated from every request by every user. Including this in the per-request SearchResponse leaks system-wide load patterns to any API caller — e.g., a client could fingerprint how busy the service is based on changing percentiles.
src/pipelines/retrieval.py, line 413-421 (link)

Retrieval plan cache has no TTL. Unlike _profile_catalog_cache (300 s expiry), entries in _retrieval_plan_cache survive until LRU eviction. If the underlying model is swapped or tool schemas change between deployments, repeated identical queries (same user + query + catalog text) will keep executing the old tool plan indefinitely without ever re-asking the LLM.
src/pipelines/retrieval.py, line 434-439 (link)

Non-finite scores pass through the sort key unchanged. float('nan') is truthy in Python, so source.score if source.score is not None else 0.0 keeps nan intact. Python's TimSort produces implementation-defined (and potentially wrong) ordering when keys contain nan. float('inf') from code search results would always sort first. The _finite_score helper already exists for exactly this case.
src/api/routes/memory.py, line 745 (link)

The route-level sort uses s.score or 0.0, which does not sanitize float('nan') — nan is truthy, so nan or 0.0 returns nan. This is the same ordering hazard as in _rank_sources, and only affects code-search results since memory results are already sorted by _rank_sources before being merged here.
src/api/schemas.py, line 191-194 (link)

The strip_search_query validator runs after Pydantic's built-in constraint checks (including min_length=1), so a query of " " (one space) passes the length check and is then reduced to "". An empty string then reaches vector store embedding calls, which may error or return semantically useless results. The fix is to strip before the length constraint fires.

michael-schvarcz requested review from ishaanxgupta and ved015 as code owners May 11, 2026 18:38

github-actions Bot added tests api pipelines labels May 11, 2026

michael-schvarcz mentioned this pull request May 11, 2026

Add low-latency raw search path separate from agentic answer synthesis #163

Open

gemini-code-assist Bot reviewed May 11, 2026

View reviewed changes

Add raw memory search fast path

bb08f07

michael-schvarcz force-pushed the codex/raw-search-fast-path branch from 8aa3426 to bb08f07 Compare May 11, 2026 18:43

Harden raw search score handling

d2df805

Ankit-Kotnala reviewed May 13, 2026

View reviewed changes

Ankit-Kotnala requested changes May 13, 2026

View reviewed changes

Handle missing source scores in answer formatting

89fe3e4

michael-schvarcz requested a review from Ankit-Kotnala May 13, 2026 13:11

Ankit-Kotnala reviewed May 14, 2026

View reviewed changes

Normalize non-finite source scores

a63bd1e

	score=round(s.score, 3),
	score=round(s.score, 3) if s.score is not None else 0.0,

Conversation

michael-schvarcz commented May 11, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

michael-schvarcz commented May 12, 2026

Uh oh!

Ankit-Kotnala May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Ankit-Kotnala left a comment

Choose a reason for hiding this comment

Uh oh!

michael-schvarcz commented May 13, 2026

Uh oh!

nagiexplorer88 commented May 14, 2026

Uh oh!

Ankit-Kotnala left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented May 23, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot commented May 23, 2026

Comments Outside Diff (5)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants